port udivmoddi4 and __aeabi_uldivmod #25
It seems that |
You might want to rebase your implementation on top of #26 and make use of the traits.
}

#[no_mangle]
pub unsafe extern "aapcs" fn __aeabi_uldivmod(num: u64, den: u64) -> u64x2 {
I don't think unsafe is necessary here. We only needed it for the mem* builtins.
Alright, thanks for the input and done.
Just ported that.
Nice. I'll get to it tomorrow.
@@ -295,3 +295,64 @@ pub extern "C" fn __udivmoddi4(a: u64, b: u64, rem: *mut u64) -> u64 {
    }
    q.u64()
}

#[no_mangle]
pub extern "C" fn __udivmodsi4(a: u32, b: u32, rem: *mut u32) -> u32 {
Use Option<&mut u32>
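A minimal sketch of what this suggestion could look like. The function name `udivmodsi4_sketch` is hypothetical, and the body uses the native `/` and `%` operators as a stand-in for the crate's software division. The point is only the signature: `Option<&mut u32>` has the same size as a raw pointer, with `None` standing in for null, so a safe body can replace the raw-pointer dereference.

```rust
use core::mem::size_of;

/// Hypothetical stand-in for `__udivmodsi4` (not the crate's code).
/// `rem` is `None` when the caller does not want the remainder.
pub extern "C" fn udivmodsi4_sketch(a: u32, b: u32, rem: Option<&mut u32>) -> u32 {
    // No `unsafe` needed: the reference is either valid or absent.
    if let Some(rem) = rem {
        *rem = a % b;
    }
    a / b
}

/// `Option<&mut T>` uses the null pointer as its `None` value,
/// so it occupies exactly one pointer-sized slot.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&mut u32>>() == size_of::<*mut u32>()
}
```

A caller that wants the remainder passes `Some(&mut r)`; one that does not passes `None`.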
So, even though
0800013c <__aeabi_uidivmod>:
800013c: b500 push {lr}
800013e: b081 sub sp, #4
8000140: 466a mov r2, sp
8000142: f000 f92b bl 800039c <__udivmodsi4>
8000146: 9900 ldr r1, [sp, #0]
8000148: b001 add sp, #4
800014a: bd00 pop {pc}
0800014c <__aeabi_uldivmod>:
800014c: e92d 4800 stmdb sp!, {fp, lr}
8000150: b084 sub sp, #16
8000152: f10d 0c08 add.w ip, sp, #8
8000156: f8cd c000 str.w ip, [sp]
800015a: f000 f805 bl 8000168 <__udivmoddi4>
800015e: 9a02 ldr r2, [sp, #8]
8000160: 9b03 ldr r3, [sp, #12]
8000162: b004 add sp, #16
8000164: e8bd 8800 ldmia.w sp!, {fp, pc}
08000168 <__udivmoddi4>:
8000168: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
(...)
That's with LTO. Without LTO, I get pretty much the same.
Yes, that's to be expected. Objdumping object files sometimes has issues with symbol resolution. Objdumping an executable should always show the correct symbols that are getting linked.
@@ -1,3 +1,35 @@
use core::intrinsics;
A comment explaining why an asm implementation was necessary (non-standard calling convention) would be nice here.
also rewrite these last two new aeabi intrinsics as naked functions
@Amanieu I think I addressed all your comments. Let me know if I missed anything.
if let Some(rem) = rem {
    *rem = u64::from(n.high() % d.low());
}
return u64::from(n.high() / d.low());
Actually I think we could just go with panic!("Division by zero") here.
}

q = (q << 1) | carry;
q
Just return (q << 1) | carry directly
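For readers without the full diff, here is a rough sketch of the two review points above: panicking on a zero divisor and ending the function with `(q << 1) | carry` as the tail expression. This is a plain shift-subtract (restoring) division written for illustration, not the crate's ported `__udivmoddi4`; the name `udivmod64_sketch` is hypothetical.

```rust
/// Hypothetical illustration, not the crate's implementation:
/// bit-by-bit restoring division of `n` by `d`.
pub fn udivmod64_sketch(n: u64, d: u64, rem: Option<&mut u64>) -> u64 {
    if d == 0 {
        panic!("Division by zero");
    }
    let mut q: u64 = 0; // quotient bits decided so far, minus the latest one
    let mut r: u64 = 0; // partial remainder, always < d between iterations
    let mut carry: u64 = 0; // quotient bit produced by the previous iteration

    // Walk the dividend from its most significant bit down.
    for i in (0..64).rev() {
        // Fold in the quotient bit from the previous iteration.
        q = (q << 1) | carry;
        // If the top bit of `r` is shifted out, the true partial remainder
        // is at least 2^64 and therefore larger than `d`.
        let overflow = r >> 63 == 1;
        r = (r << 1) | ((n >> i) & 1);
        carry = if overflow || r >= d {
            r = r.wrapping_sub(d);
            1
        } else {
            0
        };
    }

    if let Some(rem) = rem {
        *rem = r;
    }
    // Tail expression instead of `q = (q << 1) | carry; q`.
    (q << 1) | carry
}
```

The tail expression folds in the last quotient bit, which is why no separate assignment to `q` is needed before returning.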
I think that covers everything, it should be good to go after those changes.
Merging (the ARM test failures are known to be flaky). Thanks @Amanieu for the review! @thejpster I hope you can give this crate another try! Let us know if you need other intrinsics!
thejpster/stellaris-launchpad@3c901c2 Looks good to me. I reported a failure earlier, but I'd forgotten to |
I don't think you need |
Should |
true |
If I don't
|
Oh, I guess it is required then. My bad.
Nice! Experienced any significant change in binary size after moving to rustc_builtins?
I would expect it to be slightly slower, since we aren't using the asm implementation of division from compiler-rt and instead use a Rust implementation.
Hard to say as if I build in release mode, I hit a trap on startup. The Debug binary is obviously quite big.
If I get it going I'll switch back to compiler-rt.rs temporarily and let you know.
I did some measurements on this snippet:

    #[no_mangle]
    pub extern "C" fn start() -> ! {
        unsafe {
            let x = ptr::read_volatile(0x0 as *const u64);
            let y = ptr::read_volatile(0x0 as *const u64);
            let z = x / y;
            ptr::write_volatile(0x0 as *mut u64, z);
            loop {}
        }
    }

compiler-rt.rs (intrinsics in C/asm)
This crate (intrinsics in Rust)

Built using Cargo's release profile with LTO enabled for ARM Cortex-M3. And for reference, this program:

    #[no_mangle]
    pub extern "C" fn start() -> ! {
        loop {}
    }

has this size:
70 bytes bigger doesn't sound that bad. If, as @Amanieu says, there are other optimizations we could do here, then perhaps that can bring the size down to par with compiler-rt's version, or maybe even make it smaller(!).
cc @thejpster

The udivmoddi4 intrinsic is tested using quickcheck.

I'm concerned about the __aeabi_uldivmod intrinsic. First, I get infinite recursion but that probably can be fixed with the hacks @Amanieu suggested in #16. The other issue is that the assembly of the __aeabi* intrinsic doesn't look like compiler-rt's assembly implementation.

This is the assembly generated by this crate (for the arm-unknown-linux-gnueabi target):

And this is compiler-rt's implementation:

Even though I implemented the intrinsic in Rust following the compiler-rt's "C implementation", which is written in the comments of the source file:

But I used u64 because i64 doesn't really make sense (the __divmoddi4 takes unsigned integers as arguments).

@Amanieu Any idea of what's wrong with my __aeabi_uldivmod implementation? Should I implement it using assembly + naked functions instead?
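The description mentions quickcheck testing without showing the property. A rough sketch of the kind of property such a test would check is below, assuming the usual division identity `n == q * d + r` with `r < d`. To stay self-contained it does not use the quickcheck crate; the xorshift generator and the names `xorshift64` and `check_divmod_property` are hypothetical stand-ins.

```rust
/// One step of a xorshift64 generator; a hypothetical stand-in for
/// quickcheck's random input generation.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Runs `divmod` on pseudo-random inputs and checks that
/// `n == q * d + r` and `r < d`. Returns the first failing `(n, d)`.
pub fn check_divmod_property<F>(divmod: F, cases: usize) -> Result<(), (u64, u64)>
where
    F: Fn(u64, u64) -> (u64, u64),
{
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15; // arbitrary non-zero seed
    for _ in 0..cases {
        let n = xorshift64(&mut state);
        // Shift the divisor by a random amount so that both small and
        // large divisors are exercised.
        let shift = (xorshift64(&mut state) % 64) as u32;
        let d = xorshift64(&mut state) >> shift;
        if d == 0 {
            continue; // division by zero is out of scope for this property
        }
        let (q, r) = divmod(n, d);
        // Checked arithmetic so a wrong quotient cannot hide behind overflow.
        let rebuilt = q.checked_mul(d).and_then(|p| p.checked_add(r));
        if r >= d || rebuilt != Some(n) {
            return Err((n, d));
        }
    }
    Ok(())
}
```

A candidate implementation would be passed in as a closure returning `(quotient, remainder)`; the native operators `|n, d| (n / d, n % d)` serve as a sanity check that the property itself holds.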